ABSTRACT

The Integrated Moving Average (IMA) model of time series, and the
analysis of intervention effects based on it, assume random shocks
which are normally distributed. To determine the robustness of the
analysis to violations of this assumption, empirical sampling methods
were employed. Samples were generated from three populations: normal,
moderately non-normal, and severely non-normal. The samples were
combined with values of other quantities in the model, the resulting
"observations" subjected to time-series analysis, and the effect on
empirical significance levels noted. The analysis of interventions
based on the IMA model was robust to violations of the normality
assumption. (Author)

AERA Convention 
April 1975 
Washington, D.C. 



THE EFFECT OF NON-NORMAL DISTRIBUTIONS 
ON THE INTEGRATED MOVING AVERAGE MODEL 
OF TIME-SERIES ANALYSIS

Judith Doerann-George
Indiana University 

Model building and testing have played an important role in the
development of theories, and through theories in the advancement of
knowledge. A serious consideration in the use of models in research
is the degree to which it is possible in the research situation to
conform to the assumptions inherent in the model. If it can be
determined that the outcomes of statistical tests based on the model
are not altered by departure from the conditions or assumptions, the
model is said to be robust.

The Integrated Moving Average (IMA) model, and statistical methods
for estimating intervention effects developed by Box and Tiao (1965),
promise to be useful in time-series analysis. They are, however,
based on assumptions of normality and homogeneity of variance of the
random shocks affecting the system. This study was designed to
determine the robustness of the IMA model and related analysis to
non-normality of the shocks.

The author wishes to acknowledge the contributions of Profs. Vernon
L. Hendrix and Jon Morris, both of the University of Minnesota, in
the conceptualization of this study, and of David Garrett, Indiana
University, in computer programming.

Inquiries concerning this paper may be directed to Judith
Doerann-George, Education 253, Indiana University, Bloomington,
Indiana 47401.

In the time-series quasi-experiment, observations of a variable are
made at equally spaced time intervals. It is desired to make
inferences regarding an alteration in the series associated with the
introduction of an event or treatment at some point in the series.
The alteration may be either a change in level or in drift of the
series. The Integrated Moving Average model and related analytic
procedures can be used in the situation described, because they
provide a statistical test for changes in level and in drift of the
series, while allowing for the nonindependence of the observations.
Furthermore, the IMA model allows the variable observed to be the
property of a system embedded in 'white noise' or subject to random
shocks. These shocks may be absorbed in the system over time.

The expression in the IMA model for the $n_1$ pretreatment
observations ($z_t$) is:

$$z_t = L + \gamma\mu(t-1) + \mu + \gamma\sum_{i=1}^{t-1} a_i + a_t \qquad (1)$$

For the observations following intervention the expression is:

$$z_t = L + \gamma\mu(t-1) + \mu + \gamma\Delta(t-n_1-1) + \Delta + \delta + \gamma\sum_{i=1}^{t-1} a_i + a_t , \qquad (2)$$

where $\mu$ = the drift characteristic of the series
      $L$ = the initial level of the series when observations begin
      $\Delta$ = the change in drift of the series, due to intervention
      $\delta$ = the change in level of the series, also due to intervention
      $\gamma$ = the interdependence parameter, equivalent to 1- proportion
                 of shocks carried over to the following observation

A given observation or data value is considered to be a linear
function of the four parameters $\mu$, $\Delta$, $L$, and $\delta$,
and a value of $a_t$, a random normal variable with variance
$\sigma_a^2$ (Tiao in Glass and Maguire, 1968). In order to determine
the parameter values from the observations of a system, the $z_t$'s
are first transformed for a given $\gamma$:

$$y_1 = z_1 \qquad (3)$$

$$y_t = z_t - z_{t-1} + (1-\gamma)\,y_{t-1} \quad \text{for } t = 2, \ldots, n_1+n_2 \qquad (4)$$

The vector of y's which results can be expressed by the linear
model

$$y = X\theta + e \qquad (5)$$

where $X$ is the design matrix, $\theta' = (\mu, \Delta, L, \delta)$
the vector of parameters, and $e$ the vector of values of the random
normal variable $a_t$ with variance $\sigma_a^2$.
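
As an illustration of equations (1) through (4), the following Python sketch builds a series from a set of shocks and then applies the transformation. It is not the program used in the study: the function names `ima_series` and `transform`, the argument names, the seed, and the use of NumPy are all assumptions made for this example. The last lines check a property implied by the equations, namely that the transformed series minus its noise-free counterpart returns the shocks $a_t$ themselves.

```python
import numpy as np

def ima_series(mu, delta_drift, level, delta_level, gamma, shocks, n1):
    """Observations z_1..z_N built from equations (1) and (2).

    delta_drift stands for Delta, delta_level for delta, level for L
    (hypothetical argument names chosen for this sketch).
    """
    shocks = np.asarray(shocks, dtype=float)
    t = np.arange(1, shocks.size + 1)
    # gamma * sum_{i=1}^{t-1} a_i: the part of earlier shocks carried forward
    carried = gamma * np.concatenate(([0.0], np.cumsum(shocks)[:-1]))
    z = level + gamma * mu * (t - 1) + mu + carried + shocks
    post = t > n1  # observations after the intervention
    z[post] += gamma * delta_drift * (t[post] - n1 - 1) + delta_drift + delta_level
    return z

def transform(z, gamma):
    """Equations (3) and (4): y_1 = z_1, y_t = z_t - z_{t-1} + (1 - gamma) y_{t-1}."""
    z = np.asarray(z, dtype=float)
    y = np.empty_like(z)
    y[0] = z[0]
    for t in range(1, z.size):
        y[t] = z[t] - z[t - 1] + (1.0 - gamma) * y[t - 1]
    return y

# Illustrative inputs only: 30 pre- and 30 post-intervention observations
rng = np.random.default_rng(1975)
shocks = rng.standard_normal(60)
args = dict(mu=0.5, delta_drift=-0.5, level=0.0, delta_level=0.5, gamma=0.5, n1=30)
z = ima_series(shocks=shocks, **args)
z_noise_free = ima_series(shocks=np.zeros(60), **args)
recovered = transform(z, 0.5) - transform(z_noise_free, 0.5)
print(np.allclose(recovered, shocks))
```

Because both the model and the transformation are linear, the same two functions can also be used to build the design matrix numerically, one column per parameter.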

Least-squares estimates of the four parameters are obtained,
and (based on the assumption of normality of $a_t$ and sampling
theory) the following distributional statements can be made:

$$\frac{\hat{\theta}_j - \theta_j}{s_a\sqrt{c^{jj}}} \sim t_{n_1+n_2-4}, \qquad \theta_j = \mu,\ \Delta,\ L,\ \delta,$$

where $c^{jj}$ is the $j$th diagonal element of the matrix
$(X'X)^{-1}$.

The t-ratios (estimate/standard error) can be used to
determine the probability of intervention effects, in the cases
of $\delta$ and $\Delta$. T-ratios are also calculated for $L$ and
$\mu$.
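
The estimation step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Glass and Maguire program: the function names are hypothetical, the design matrix is built numerically by transforming the noise-free series for each unit parameter vector (which relies only on the linearity of equations (1) through (4)), and $s_a$ is taken to be the residual standard deviation on $n_1+n_2-4$ degrees of freedom.

```python
import numpy as np

def transform(z, gamma):
    """Equations (3)-(4): y_1 = z_1, y_t = z_t - z_{t-1} + (1 - gamma) * y_{t-1}."""
    z = np.asarray(z, dtype=float)
    y = np.empty_like(z)
    y[0] = z[0]
    for t in range(1, z.size):
        y[t] = z[t] - z[t - 1] + (1.0 - gamma) * y[t - 1]
    return y

def deterministic_part(theta, gamma, n1, n2):
    """Noise-free part of equations (1)-(2) for theta = (mu, Delta, L, delta)."""
    mu, d_drift, level, d_level = theta
    t = np.arange(1, n1 + n2 + 1)
    post = (t > n1).astype(float)
    pre_terms = level + gamma * mu * (t - 1) + mu
    post_terms = gamma * d_drift * (t - n1 - 1) + d_drift + d_level
    return pre_terms + post * post_terms

def design_matrix(gamma, n1, n2):
    # Column j is the transformed noise-free series for the j-th unit vector
    cols = [transform(deterministic_part(e, gamma, n1, n2), gamma) for e in np.eye(4)]
    return np.column_stack(cols)

def estimate(z, gamma, n1, n2):
    """Least-squares estimates, standard errors and t-ratios (estimate / s.e.)."""
    X = design_matrix(gamma, n1, n2)
    y = transform(z, gamma)
    c = np.linalg.inv(X.T @ X)                    # (X'X)^{-1}
    theta_hat = c @ X.T @ y
    resid = y - X @ theta_hat
    s_a = np.sqrt(resid @ resid / (y.size - 4))   # assumed estimator of sigma_a
    se = s_a * np.sqrt(np.diag(c))
    return theta_hat, se, theta_hat / se

# Illustrative data: normal shocks, theta = (mu, Delta, L, delta)
rng = np.random.default_rng(0)
gamma, n1, n2 = 0.5, 30, 30
theta = np.array([0.5, -0.5, 0.0, 0.5])
a = rng.standard_normal(n1 + n2)
noise = gamma * np.concatenate(([0.0], np.cumsum(a)[:-1])) + a
z = deterministic_part(theta, gamma, n1, n2) + noise
theta_hat, se, t_ratio = estimate(z, gamma, n1, n2)
print(np.round(theta_hat, 3), np.round(t_ratio, 2))
```

The explicit inverse is used here only because the text refers to the diagonal of $(X'X)^{-1}$; a solver such as `numpy.linalg.lstsq` would give the same estimates.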



Procedures 

To test the robustness of the IMA model under violations
of the normality assumption, it was necessary to obtain data
that were non-normal and to subject such data to time-series
analysis. The origin of the data is not important to the
problem, so long as they conform to the characteristics needed
(Hammersley and Handscomb, 1964). Consequently, simulation
techniques were used to generate the random shock values $a_t$.

Three populations of random shocks were selected which
had desired degrees of skewness and kurtosis: one normal, one
moderately non-normal, and one severely non-normal. The
binomial n, p parameters which characterize each of the three
populations were determined by a recently developed technique
(Martin and Hendrix, 1974). Each of the three n, p pairs thus
determined was used with values from a random number generator
to create 1000 samples of 60 each. The samples were standardized
to mean zero and standard deviation one, before use in
the time series. (Such standardization does not affect the value
of skewness or kurtosis.) The actual skewness and kurtosis
of these samples compared well with the skewness and kurtosis
as calculated from binomial values n and p. See Table 1.



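TABLE 1: Comparison of Desired and Actual Measures of Skewness
($\beta_1$) and Kurtosis ($\beta_2$)

              Binomial Input Value    Desired                   Actual*
Population    n       p               $\beta_1$   $\beta_2$     $\beta_1$   $\beta_2$
I             100     .5              0.0000      2.9800        0.0000      2.9881
II            40      .9870           1.8484      4.7984        1.7641      4.8082
III           40      .9958           5.8775      8.8275        5.6020      8.6928

Desired: $\beta_1 = \dfrac{(q-p)^2}{npq}$, $\beta_2 = \dfrac{1-6pq}{npq} + 3$

Actual: $\beta_1 = \dfrac{M_3^2}{M_2^3}$, $\beta_2 = \dfrac{M_4}{M_2^2}$

*$M_k = \dfrac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^k$, $i = 1, 2, \ldots, 60{,}000$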
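
The "desired" columns of Table 1 follow directly from the binomial formulas, and the "actual" columns from sample moments. The Python sketch below reproduces the desired values and shows one way such shocks could be generated. It is an assumption-laden illustration: the function names are hypothetical, NumPy's generator stands in for the unspecified random number generator of the study, and standardization here uses the theoretical binomial mean and standard deviation, which may differ from the procedure actually used.

```python
import numpy as np

def binomial_beta1_beta2(n, p):
    """Skewness and kurtosis measures of a binomial(n, p) population."""
    q = 1.0 - p
    npq = n * p * q
    beta1 = (q - p) ** 2 / npq
    beta2 = (1.0 - 6.0 * p * q) / npq + 3.0
    return beta1, beta2

def sample_beta1_beta2(x):
    """beta_1 = M3^2 / M2^3 and beta_2 = M4 / M2^2 from central moments of all values."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2, m3, m4 = (np.mean(d ** k) for k in (2, 3, 4))
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2

def standardized_binomial_shocks(n, p, n_samples=1000, size=60, seed=0):
    """n_samples samples of `size` binomial draws, scaled to mean 0 and s.d. 1.

    Assumption: scaling uses the population mean n*p and s.d. sqrt(n*p*q).
    """
    rng = np.random.default_rng(seed)
    x = rng.binomial(n, p, size=(n_samples, size)).astype(float)
    return (x - n * p) / np.sqrt(n * p * (1.0 - p))

for label, n, p in (("I", 100, 0.5), ("II", 40, 0.9870), ("III", 40, 0.9958)):
    desired = binomial_beta1_beta2(n, p)
    actual = sample_beta1_beta2(standardized_binomial_shocks(n, p))
    print(label, np.round(desired, 4), np.round(actual, 4))
```

The simulated "actual" values will not match the table digit for digit, since they depend on the random number stream; only the desired values are deterministic.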










Four values of the interdependence parameter in time series,
$\gamma$, and three values of $\delta$, $\mu$, and $\Delta$, other
time-series parameters, were selected and varied systematically for
each simulation run. The values of $\gamma$ were 0.1, 0.5, 1.0, and
1.5, which adequately represent the range of $\gamma$ values found in
practice. For the other parameters, values of 0.5, 0.0, and -0.5 were
selected to provide tests of the null condition and to make it
unnecessary to use supply distributions with positive and negative
skewness. A few trial runs indicated to the investigator that 0.5
would be a reasonable magnitude of treatment effect. $L$, the initial
level of the time series, was maintained at zero throughout the study.

Combinations of the four values of $\gamma$, and three each of
$\mu$, $\delta$, and $\Delta$ yielded 108 parameter sets, and a
different parameter set was used in each simulation run. A computer
simulation run consisted of the generation of random shock samples
from one population which were then combined with a set of
time-series parameter values to create 'observations' of a
time series. Each of 1000 sets of 60 observations was
subjected to time-series analysis and the resulting t's
tallied. The entire process was repeated with samples from
the remaining two supply distributions.
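
The count of 108 parameter sets is the product 4 x 3 x 3 x 3. A short Python sketch of the grid, with hypothetical names and with the $\gamma$ values copied from the paragraph above as placeholders:

```python
from itertools import product

GAMMAS = (0.1, 0.5, 1.0, 1.5)   # interdependence values as listed in the text
EFFECTS = (-0.5, 0.0, 0.5)      # input values for mu, delta and Delta

def parameter_sets(gammas=GAMMAS, effects=EFFECTS):
    """One dict per simulation run; L is held at zero throughout."""
    return [
        dict(gamma=g, mu=m, delta_level=dl, delta_drift=dd, level=0.0)
        for g, m, dl, dd in product(gammas, effects, effects, effects)
    ]

sets = parameter_sets()
print(len(sets))
```

Each value of `gamma` appears in 27 of the sets, which is the basis for the averaging described under Results.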

More specifically, the 60 values in a sample from one
of the three populations were combined with input values of
$\gamma$, $\mu$, $\delta$, and $\Delta$ ($L$ = 0) according to the
linear model of time series (equations 1 and 2) to yield 30
pre-intervention and 30 post-intervention data values or
observations. The data set was analyzed, using a program based on
one developed by Glass and Maguire (1968). From a data set and the
true, or input, value of $\gamma$ the following were computed:
least-squares estimates of $\mu$, $\Delta$, $L$ and $\delta$, the
standard error of each estimate, and four values of t obtained by
dividing each estimate by its standard error. Consequently, each of
the 108 simulation runs produced 12,000 t's, 1000 for each of four
parameters and the three populations.

The 12 distributions of 1000 t's each were compared to
the t-distribution with n-4=56 df to determine whether actual
or empirical significance levels differed from nominal (.10,
.05, and .01) ones. This was done by scanning the 1000 t's
for a single supply distribution and a single parameter, and
tallying the number of t's more extreme than the critical
$t_{\alpha/2}$ values of ±1.671, ±2.000, and ±2.600. The entire
process of data generation, parameter estimation, and tally of the
12 distributions of 1000 t's was executed with all of the
108 input parameter sets.
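
The tally amounts to a proportion of t's beyond each critical value. A minimal Python sketch, assuming the t's for one parameter and one supply distribution are held in an array; the function name is hypothetical and the critical values are those quoted in the text:

```python
import numpy as np

# Two-tailed critical values as quoted in the text, keyed by nominal alpha
CRITICAL_T = {0.10: 1.671, 0.05: 2.000, 0.01: 2.600}

def empirical_levels(t_values, critical=CRITICAL_T):
    """Proportion of t's more extreme than +/- each critical value."""
    t_abs = np.abs(np.asarray(t_values, dtype=float))
    return {alpha: float(np.mean(t_abs > c)) for alpha, c in critical.items()}

# Tiny illustrative input; a real run would pass 1000 t-ratios
print(empirical_levels([0.0, 1.7, -2.1, 2.7, -0.3]))
```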

Results and Conclusion 

Each of the 108 simulation runs yielded 36 empirical
significance levels, one for each combination of the three
populations, four estimated parameters, and three nominal
significance levels. To condense those results, the results
from similar input conditions were combined.

The four values of $\gamma$ were used for 27 runs each. The
-.5, 0.0, and +.5 values of $\mu$, $\Delta$, and $\delta$ were
employed in the 27 possible combinations, once with each $\gamma$
value. $L$, the initial series level, was maintained at zero for all
108 runs.

Consequently, with each $\gamma$ value, results were obtained
for 27 runs in which $L$ was zero; there were nine when $\mu$=-.5,
nine when $\mu$=0.0, and nine for which $\mu$=+.5. Similarly, there
were nine runs each in which the input values of $\Delta$ and of
$\delta$ were -.5, 0.0, and +.5. To summarize overall trends, the
empirical significance levels for a single parameter which
were obtained with the same input value of that parameter were
averaged, separately for each $\gamma$ value.
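
The grouping behind these averages can be sketched in Python as below. The record layout (`gamma`, `mu`, `delta_drift`, `delta_level`, `rate`), the function name, and the constant placeholder rate of 0.05 are assumptions for illustration; the $\gamma$ values are the ones listed under Procedures. No simulation results are reproduced.

```python
from collections import defaultdict
from itertools import product

def average_by_input(runs, parameter):
    """Mean empirical rate and run count for each (gamma, input value) pair."""
    groups = defaultdict(list)
    for run in runs:
        groups[(run["gamma"], run[parameter])].append(run["rate"])
    return {key: (sum(v) / len(v), len(v)) for key, v in groups.items()}

# Placeholder records: one per parameter set, each with a dummy rate
effects = (-0.5, 0.0, 0.5)
runs = [
    dict(gamma=g, mu=m, delta_drift=dd, delta_level=dl, rate=0.05)
    for g, m, dd, dl in product((0.1, 0.5, 1.0, 1.5), effects, effects, effects)
]
summary = average_by_input(runs, "mu")
print(len(summary), {count for _, count in summary.values()})
```

Grouping on `mu` gives 12 cells of nine runs each; since $L$ was zero in every run, the corresponding averages for $L$ pool all 27 runs at a given $\gamma$.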

When the null hypothesis was true for all or some of the
parameters $\mu$, $\Delta$, $L$ and $\delta$ (see Table 2, Appendix),
the proportion of t-ratios more extreme than $t_{\alpha/2}$ (the
critical t for two-tailed nominal alpha level $\alpha$) was close to
the nominal level of significance for all supply distributions,
especially when $\gamma$ = .01. At the higher values of $\gamma$,
0.5, 1.0, and 1.5, non-normality of the random shock supply
distribution did seem to broaden the extreme tails of the
t-distributions; at $\alpha$ = .01, probability of Type I error
usually increased with increasing non-normality of population,
especially for $L$ and $\delta$. This effect was slight for $\mu$,
the series drift. Actual significance levels associated with zero
input values of $\Delta$, the change in drift, were not affected in
any systematic way by non-normality of population.

When $\mu$, $\Delta$, and/or $\delta$ input values were ±.5 (see
Tables 3 and 4, Appendix), nominal and actual significance levels
were very similar when $\gamma$ = .01, which indicates that non-zero
drift ($\mu$) or treatment effects ($\Delta$ and $\delta$) would be
difficult to detect regardless of population. At higher levels of
$\gamma$, non-zero input values showed an effect in increased
empirical significance level obtained; this is particularly marked
for $\mu$, moderate for $\Delta$, and least severe for $\delta$. (See
Figure 1 for $\gamma$ = 1.0 and input values = -.5.) There was a
tendency for actual significance levels to increase with
non-normality of supply distribution. However, observable trends were
not consistent for all three parameters across $\gamma$ levels or at
all $\alpha$ levels. In all cases, any variation apparently
associated with non-normality of supply distribution was small
compared to the change in significance level due to non-zero input
values of $\mu$, $\Delta$, and $\delta$.

FIGURE 1: Mean Actual Significance Levels for Parameters
$\mu$, $\Delta$, and $\delta$ by Alpha Level for Three Populations
when $\gamma$ = 1.0 and Input Values = -.5
It can be concluded that the analysis of time series
based on the IMA model is robust to violations of the assumption
of normality of the random shock population, particularly
at alpha levels of .10 and .05. There does appear to be a
slight increase in probability of Type I error with
non-normality of shocks at $\alpha$ = .01. However, when the null
hypothesis was not true, the probability of rejection of the
hypothesis was not affected consistently by population type,
and differences in rejection probability due to population
type were negligible in comparison to actual significance level
magnitude due to treatment effect ($\Delta$, $\delta$) or series
drift ($\mu$).

Variations apparently due to population type were not
consistent enough nor of sufficient magnitude to warrant
concern regarding random shock normality when setting alpha
levels. In an experimental situation, of course, it would
not be known whether the null hypothesis regarding $\Delta$ and
$\delta$ is true, since those are possible treatment effects to be
detected by the analysis. If there are sufficient data on
the system being studied, the probable values of $L$ and $\mu$
could be known, since those are characteristic of a given
series. Consequently, it seems justifiable to recommend that
possible non-normality of random shocks not be of primary
concern to a researcher who is choosing alpha levels for a
particular experiment employing the IMA model in time-series
analysis.
TABLE 2: Empirical Significance Levels for Four
Parameters, Averaged Over All Cases of Zero Input Value
of Each Parameter

$\gamma$   Parameter   No. of Runs   Population   $\alpha$ = .10   $\alpha$ = .05   $\alpha$ = .01
                       Averaged

.01        $\mu$       9             I            .093             .050             .011
                                     II           .099             .048             .008
                                     III          .099             .049             .009
.01        $\Delta$    9             I            .096             .050             .009
                                     II           .098             .046             .009
                                     III          .102             .048             .008
.01        $L$         27            I            .095             .050             .011
                                     II           .099             .049             .009
                                     III          .098             .051             .011
.01        $\delta$    9             I            .100             .053             .010
                                     II           .095             .050             .009
                                     III          .100             .049             .009
.5         $\mu$       9             I            .100             .051             .010
                                     II           .102             .052             .013
                                     III          .102             .050             .020
.5         $\Delta$    9             I            .096             .052             .010
                                     II           .099             .052             .009
                                     III          .099             .044             .007
.5         $L$         27            I            .101             .049             .010
                                     II           .077             .049             .018
                                     III          .092             .061             .032
.5         $\delta$    9             I            [illegible]      .050             .008
                                     II           .072             .046             .017
                                     III          .091             .062             .032
1.0        $\mu$       9             I            [illegible]      [illegible]      [illegible]
                                     II           .104             .053             .013
                                     III          .114             .065             .017
1.0        $\Delta$    9             I            [illegible]      [illegible]      [illegible]
                                     II           .103             .054             .009
                                     III          .101             .049             .008
1.0        $L$         27            I            .103             .052             .014
                                     II           .081             .057             .021
                                     III          .132             .091             .036
1.0        $\delta$    9             I            .105             .052             .010
                                     II           .089             .067             .024
                                     III          .131             .092             .038
1.5        $\mu$       9             I            .105             .054             .010
                                     II           [illegible]      .053             .012
                                     III          .110             .059             .018
1.5        $\Delta$    9             I            .099             .052             .010
                                     II           [illegible]      .054             .009
                                     III          .096             .045             .007
1.5        $L$         27            I            .103             .051             .010
                                     II           .096             .057             .019
                                     III          .125             .081             .029
1.5        $\delta$    9             I            .101             .050             .009
                                     II           .096             .055             .018
                                     III          .126             .079             .029
TABLE 3: Empirical Significance Levels for Three
Parameters, Averaged Over All Cases of -.5 Input Value
of Each Parameter

$\gamma$   Parameter   No. of Runs   Population   $\alpha$ = .10   $\alpha$ = .05   $\alpha$ = .01
                       Averaged

.01        $\mu$       9             I            [illegible]      .054             .012
                                     II           [illegible]      [illegible]      .009
                                     III          .109             .057             .013
.01        $\Delta$    9             I            .101             .049             .008
                                     II           [illegible]      [illegible]      .012
                                     III          .105             .050             .009
.01        $\delta$    9             I            .105             .050             .009
                                     II           [illegible]      .050             .010
                                     III          .106             .054             .009
.5         $\mu$       9             I            [illegible]      .733             .490
                                     II           [illegible]      [illegible]      .473
                                     III          .872             .766             .471
.5         $\Delta$    9             I            .549             .429             .218
                                     II           [illegible]      [illegible]      .227
                                     III          .583             .461             .249
.5         $\delta$    9             I            .156             .086             .023
                                     II           [illegible]      .100             .043
                                     III          .152             .115             .055
1.0        $\mu$       9             I            .843             .754             .518
                                     II           [illegible]      .772             .512
                                     III          .903             .795             .512
1.0        $\Delta$    9             I            .594             .462             .235
                                     II           [illegible]      [illegible]      [illegible]
                                     III          .617             .492             .274
1.0        $\delta$    9             I            .142             .081             .020
                                     II           .115             .090             .049
                                     III          .157             .140             .079
1.5        $\mu$       9             I            .857             .774             .541
                                     II           .877             .788             .527
                                     III          .915             .822             .538
1.5        $\Delta$    9             I            .597             .465             .243
                                     II           .609             .483             .252
                                     III          .623             .499             .276
1.5        $\delta$    9             I            .157             .090             .025
                                     II           .146             .097             .045
                                     III          .147             .125             .067

TABLE 4: Empirical Significance Levels for Three
Parameters, Averaged Over All Cases of +.5 Input Value
of Each Parameter

$\gamma$   Parameter   No. of Runs   Population   $\alpha$ = .10   $\alpha$ = .05   $\alpha$ = .01
                       Averaged

.01        $\mu$       9             I            .104             .053             .013
                                     II           [illegible]      [illegible]      [illegible]
                                     III          .119             .058             .015
.01        $\Delta$    9             I            .105             .051             .010
                                     II           .106             .052             .009
                                     III          .105             .053             .009
.01        $\delta$    9             I            .099             .050             .010
                                     II           .104             .052             .011
                                     III          .106             .055             .011
.5         $\mu$       9             I            .816             .715             .469
                                     II           .809             .723             .502
                                     III          .796             .714             .526
.5         $\Delta$    9             I            .569             .449             .216
                                     II           .580             .453             .228
                                     III          .588             .470             .258
.5         $\delta$    9             I            .149             .085             .033
                                     II           .138             .040             .009
                                     III          .067             .035             .018
1.0        $\mu$       9             I            .849             .754             .517
                                     II           .819             .739             .534
                                     III          .811             .726             .535
1.0        $\Delta$    9             I            .590             .467             .231
                                     II           .607             .480             .257
                                     III          .624             .496             .285
1.0        $\delta$    9             I            .143             .078             .021
                                     II           [illegible]      [illegible]      .010
                                     III          .068             .036             .019
1.5        $\mu$       9             I            .844             .758             .518
                                     II           .830             .748             .541
                                     III          .824             .743             .557
1.5        $\Delta$    9             I            .610             .478             .242
                                     II           .610             .490             .253
                                     III          .621             .505             .285
1.5        $\delta$    9             I            .156             .085             .022
                                     II           .146             .067             .014
                                     III          .149             .089             .023
